Non-record: hybrid spiking Transformer (SNN) with a multi-step spiking MLP #664
Open
tsbiosky wants to merge 1 commit into openai:main from
Hybrid Spiking Neural Networks (SNNs) MLP
val_bpb: 1.2982 | 15.78 MB | 8×H100 SXM
A contest-friendly hybrid SNN submission built from the train_gpt.py baseline: keep dense GQA attention and the original training/eval/compression pipeline, but replace the standard feed-forward block with a small multi-step leaky integrate-and-fire (LIF-style) spiking MLP.
Reference: https://arxiv.org/pdf/2203.14679
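To make the idea of a multi-step LIF-style spiking MLP concrete, here is a minimal pure-Python sketch. It is not the code from this PR: the function names (`lif_multi_step`, `linear`, `spiking_mlp`), the hard reset to zero, the decay and threshold values, and the rate-coded readout (spike count divided by the number of steps) are all assumptions chosen for illustration. A trainable version would also need a surrogate gradient for the non-differentiable spike, which is omitted here.

```python
# Illustrative sketch only: a rate-coded, multi-step LIF feed-forward block.
# All names and hyperparameters below are hypothetical, not taken from the PR.

def linear(x, weight):
    """Dense layer without bias; each row of `weight` is one output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def lif_multi_step(currents, num_steps=4, decay=0.5, threshold=1.0):
    """Drive one LIF neuron per input current for `num_steps` steps.

    The same current is injected at every step (an assumption). Returns the
    firing rate of each neuron: spikes emitted divided by `num_steps`.
    """
    membrane = [0.0] * len(currents)
    spike_counts = [0] * len(currents)
    for _ in range(num_steps):
        for i, current in enumerate(currents):
            # Leaky integration: decay the old potential, add the input.
            membrane[i] = decay * membrane[i] + current
            if membrane[i] >= threshold:
                spike_counts[i] += 1
                membrane[i] = 0.0  # hard reset after a spike (assumption)
    return [count / num_steps for count in spike_counts]


def spiking_mlp(x, w_in, w_out, num_steps=4, decay=0.5, threshold=1.0):
    """Up-projection -> multi-step LIF neurons -> down-projection."""
    hidden_currents = linear(x, w_in)
    rates = lif_multi_step(hidden_currents, num_steps, decay, threshold)
    return linear(rates, w_out)


if __name__ == "__main__":
    print(lif_multi_step([0.0, 0.6, 1.5]))
    w_in = [[0.5, 0.5], [0.0, 0.0], [1.0, -1.0]]
    w_out = [[2.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    print(spiking_mlp([1.0, 2.0], w_in, w_out))
```

In this sketch the LIF layer stands in for the usual pointwise activation between the two projections: a larger input current makes a neuron cross the threshold more often within the fixed number of steps, so the firing rate acts as a coarse, quantized nonlinearity.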
Why this is interesting
This is not a fully spiking language model. It is a hybrid Transformer + SNN-MLP design:
That makes the experiment meaningful for the contest setting because it isolates one question: